Inference for a difference in means: Two-sample t-test

DATAX121-23A (HAM) & (SEC) - Introduction to Statistical Methods

Learning Outcomes

  • Quantifying the uncertainty for the difference between two sample means from independent samples
  • How to construct and interpret a confidence interval for the difference between two population means
  • How to conduct and interpret a hypothesis test for the difference between two population means

x̄1 - x̄2 from two independent samples

From one-sample to two-sample

Recall that one assumption for inference on the population mean, \(\mu\), was that the data were unimodal—as this was an indicator that one measure of centre was appropriate for the data

If this assumption was not met, often, the most logical explanation was that a categorical variable was not measured

Hence, we often collect more than the variable of interest when carrying out observational studies and conducting (randomised) experiments

Let’s start with the scenario where we measured the same numeric variable for two independent groups

Simulating random samples of two populations

Sampling distribution of x̄1 - x̄2

If both population means, \(\mu_1 ~ \& ~ \mu_2\), and both population standard deviations, \(\sigma_1 ~ \& ~ \sigma_2\), are known—The ground “truths” (parameters) that summarise all possible values we could observe

The sampling distribution of the difference between two sample means, \(\bar{x}_1 - \bar{x}_2\), is

\[ \bar{x}_1 - \bar{x}_2 \overset{\text{approx.}}{\sim} \text{Normal} \! \left(\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, ~ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}} \right) \]

The use of the \(\bar{x}_1 - \bar{x}_2\) subscripts is to make it clear that we are talking about the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) and not the possible values we could observe
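We can check this result by simulation. The sketch below uses assumed population values (\(\mu_1 = 100, \sigma_1 = 15, \mu_2 = 90, \sigma_2 = 12\)) and sample sizes (\(n_1 = 40, n_2 = 35\))—all hypothetical—and repeatedly draws two independent samples, recording \(\bar{x}_1 - \bar{x}_2\) each time:

```r
set.seed(121)

# Hypothetical population parameters and sample sizes
mu1 <- 100; sigma1 <- 15; n1 <- 40
mu2 <-  90; sigma2 <- 12; n2 <- 35

# Draw many pairs of independent samples; record xbar1 - xbar2 each time
diffs <- replicate(10000,
  mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

mean(diffs)                          # close to mu1 - mu2 = 10
sd(diffs)                            # close to the theoretical value below
sqrt(sigma1^2 / n1 + sigma2^2 / n2)
```

A histogram of `diffs` would look approximately Normal, matching the sampling distribution above.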

Assumptions for inference on μ1 - μ2

  1. Two independent groups
  2. Within group: Independent observations
  3. Within group: Unimodal—one peak
  4. Within group: Approximately symmetrical about the group’s sample mean, \(\bar{x}_i\), and there are no outliers—Like the one-sample t-test, we are more lenient about this assumption as \(n\) becomes larger

More on 1.

  • Random samples meet this by randomly selecting observations from the population; stratified sampling ensures that both groups are sampled
  • Randomised experiments meet this by randomising the levels of the categorical explanatory variable

Definition: se(x̄1 - x̄2)

The standard error of the difference between sample means, \(\bar{x}_1 - \bar{x}_2\), is

\[ \text{se}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} \]

where:

  • \(s_1\) is the sample standard deviation of the numeric response variable for the first group
  • \(s_2\) is the sample standard deviation of the numeric response variable for the second group
  • \(n_1 ~ \& ~ n_2\) are the number of observations in the first and second groups, respectively
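As a quick sketch, the definition translates directly into an R helper (the function name is our own); the summary values below are the CS 1.1 figures reported later in these slides:

```r
# Standard error of xbar1 - xbar2 from summary statistics
se.diff <- function(s1, s2, n1, n2) {
  sqrt(s1^2 / n1 + s2^2 / n2)
}

# CS 1.1: Auckland vs Wellington gross weekly incomes
se.diff(s1 = 885.168, s2 = 840.5936, n1 = 9059, n2 = 3459)  # approx 17.05
```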

CS 1.1 revisited: NZ income snapshot in 2011

Do Aucklanders, on average, have the same gross weekly income as Wellingtonians in June quarter of 2011?

Are all four assumptions met?

# histogram() is provided by the lattice package
library(lattice)

# Reading in the data, then subsetting it to keep only Aucklanders
  # and Wellingtonians
nzis.subset <- read.csv("datasets/NZIS-CART-SURF-2011.csv") |>
  subset(region == "Auckland" | region == "Wellington")
histogram( ~ income | region, data = nzis.subset, nint = 50, 
          type = "count", xlab = "Gross Weekly Income ($)",
          main = "NZer's gross weekly income snapshot in 2011")

Figure: The gross weekly income of 12518 New Zealanders

CS 1.1 revisited: NZ income snapshot in 2011

Do Aucklanders, on average, have the same gross weekly income as Wellingtonians in June quarter of 2011?

\(\bar{x}_A - \bar{x}_B = -9.3062\), \(\text{se}(\bar{x}_A - \bar{x}_B) = \sqrt{290.7690458} = 17.0519514\)

# Use split() to report the sample means, sample standard 
  # deviations and the number of observations for each group
split(nzis.subset, ~ region) |>
  lapply(\(x) mean(x$income))
$Auckland
[1] 720.1581

$Wellington
[1] 729.4643
split(nzis.subset, ~ region) |>
  lapply(\(x) sd(x$income))
$Auckland
[1] 885.168

$Wellington
[1] 840.5936
split(nzis.subset, ~ region) |>
  lapply(\(x) nrow(x))
$Auckland
[1] 9059

$Wellington
[1] 3459

A confidence interval for μ1 - μ2

Definition: (1 - α) Confidence interval for μ1 - μ2

\[ \bar{x}_1 - \bar{x}_2 \pm t^*_{1-\alpha/2}(\nu) \times \text{se}(\bar{x}_1 - \bar{x}_2) \]

where:

  • \(\bar{x}_1\) is the sample mean of the first group
  • \(\bar{x}_2\) is the sample mean of the second group
  • \(n_1 ~ \& ~ n_2\) are the number of observations in the first and second groups, respectively
  • The confidence level is \((1 - \alpha)\), where \(\alpha\) is a proportion
  • The degrees of freedom, \(\nu\), which software will calculate for us—see Slide 28
  • \(t^*_{1-\alpha/2}(\nu)\) is the t-multiplier for the prescribed confidence level of \((1 - \alpha)\)
    • For example, a confidence level of 90% results in \(t^*_{0.95}(\nu)\)
  • \(\text{se}(\bar{x}_1 - \bar{x}_2)\) is the standard error of \(\bar{x}_1 - \bar{x}_2\)—see Slide 8
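Putting the pieces together, here is a sketch of the interval as an R function (using the \(\nu\) equation from Slide 28; the helper name is our own). Feeding it the CS 1.1 summary statistics reproduces, up to rounding of the inputs, the interval reported on the next slides:

```r
# (1 - alpha) confidence interval for mu1 - mu2 from summary statistics
ci.diff <- function(xbar1, xbar2, s1, s2, n1, n2, conf = 0.95) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t.mult <- qt(1 - (1 - conf) / 2, df = nu)
  (xbar1 - xbar2) + c(-1, 1) * t.mult * se
}

# CS 1.1: Auckland vs Wellington gross weekly incomes
ci.diff(720.1581, 729.4643, 885.168, 840.5936, 9059, 3459)
```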

CS 1.1 revisited: NZ income snapshot in 2011

Do Aucklanders, on average, have the same gross weekly income as Wellingtonians in June quarter of 2011?

Construct and interpret a 95% confidence interval for the difference between population average gross weekly incomes of Aucklanders and Wellingtonians to answer this question

From Slide 10:
\(\phantom{\bullet} \bar{x}_{A} - \bar{x}_B = -9.3062\)
\(\phantom{\bullet} \text{se}(\bar{x}_{A} - \bar{x}_B) = 17.05 ~ (2 ~ \text{dp})\)

The appropriate t-multiplier is:
\(\phantom{\bullet} t^*_{0.975}(6557.4) = 1.96\)

The 95% CI for \(\mu_A - \mu_B \approx\)
\(\phantom{\bullet} (-42.7242, 24.1118)\)

Notes on interpreting a C.I. for μ1 - μ2

  • A positive number, \(\color{tomato}+\), corresponds to the population/underlying mean of the first group being greater than that of the second group
  • A negative number, \(\color{steelblue}-\), corresponds to the population/underlying mean of the first group being lower than that of the second group
  • You can “invert” the signs of the C.I. to interpret it instead as one for \(\mu_2 - \mu_1\)
    Recall that the 95% CI for \(\mu_A - \mu_B\) was \((-42.7242, 24.1118)\); inverting the signs gives \((-24.1118, 42.7242)\) as the 95% CI for \(\mu_B - \mu_A\)

A hypothesis test for μ1 - μ2

Also known as the two-sample t-test (for μ1 - μ2)

Hypothesis statements for μ1 - μ2

The material for one numeric variable and two groups (one categorical variable) presents it as \(\mu_1 - \mu_2\). Why…?

Let’s consider an abstract set of null and alternative hypothesis statements

\(\phantom{\bullet} H_0 \! : \mu_1 = \mu_2\)
\(\phantom{\bullet} H_1 \! : \mu_1 \neq \mu_2\)

As is, it is not intuitive how we would specify a hypothesised difference between the two means, \(\mu_1 ~ \& ~ \mu_2\)

Let’s consider the following abstract set of null and alternative hypothesis statements

\(\phantom{\bullet} H_0 \! : \mu_1 - \mu_2 = 0\)
\(\phantom{\bullet} H_1 \! : \mu_1 - \mu_2 \neq 0\)

We can now specify a hypothesised difference between the two means, \(\mu_1 ~ \& ~ \mu_2\)

Definition: The test statistic for μ1 - μ2

\[ t_0 = \frac{(\bar{x}_1 - \bar{x}_2)- \text{Diff}_0}{\text{se}(\bar{x}_1 - \bar{x}_2)} \]

where:

  • \(t_0\) is the t-test statistic (for \(\mu_1 - \mu_2\))
  • \(\bar{x}_1\) is the sample mean of the first group
  • \(\bar{x}_2\) is the sample mean of the second group
  • \(\text{Diff}_0\) is the hypothesised difference between the population means, \(\mu_1 - \mu_2\)
  • \(\text{se}(\bar{x}_1 - \bar{x}_2)\) is the standard error of \(\bar{x}_1 - \bar{x}_2\)—see Slide 8
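Using the CS 1.1 summary values reported on the earlier slides, the test statistic can be computed directly:

```r
# CS 1.1: test statistic for H0: mu_A - mu_B = 0
xbar.diff <- 720.1581 - 729.4643                  # xbar_A - xbar_B
se <- sqrt(885.168^2 / 9059 + 840.5936^2 / 3459)  # se(xbar_A - xbar_B)
t0 <- (xbar.diff - 0) / se                        # hypothesised Diff0 = 0
t0  # approx -0.55
```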

Calculation of the p-value (for μ1 - μ2)

Let \(T\) be the Student’s t-distribution with \(\nu = \ldots\), see Slide 28

  • \(\nu\) is the Student’s t-distribution’s degrees of freedom parameter

If it is a two-sided test, e.g. \(H_1 \! : \mu_1 - \mu_2 \neq \text{Diff}_0\)

\(\quad p\text{-value} = 2 \times \mathbb{P}(T > |t_0|)\)

If it is a one-sided test and \(H_1 \! : \mu_1 - \mu_2 > \text{Diff}_0\)

\(\quad p\text{-value} = \mathbb{P}(T > t_0)\)

If it is a one-sided test and \(H_1 \! : \mu_1 - \mu_2 < \text{Diff}_0\)

\(\quad p\text{-value} = \mathbb{P}(T < t_0)\)
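In R, these three cases map directly onto `pt()`. As an illustration, we use \(t_0 = -0.55\) and \(\nu = 6557.4\) (the CS 1.1 values from these slides):

```r
t0 <- -0.55
nu <- 6557.4

# Two-sided: H1: mu1 - mu2 != Diff0
2 * pt(abs(t0), df = nu, lower.tail = FALSE)

# One-sided: H1: mu1 - mu2 > Diff0
pt(t0, df = nu, lower.tail = FALSE)

# One-sided: H1: mu1 - mu2 < Diff0
pt(t0, df = nu)
```

Note that the two one-sided p-values always sum to 1, since they are complementary tail areas.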

CS 1.1 revisited: NZ income snapshot in 2011

Do Aucklanders, on average, have the same gross weekly income as Wellingtonians in June quarter of 2011?

Conduct and interpret a hypothesis test at the 5% significance level to answer this question

From Slide 10:
\(\phantom{\bullet} \bar{x}_{A} - \bar{x}_B = -9.3062\)
\(\phantom{\bullet} \text{se}(\bar{x}_{A} - \bar{x}_B) = 17.05 ~ (2 ~ \text{dp})\)

Hypothesis statements:
\(\phantom{\bullet} H_0\!: \mu_A - \mu_B = 0\)
\(\phantom{\bullet} H_1\!: \mu_A - \mu_B \neq 0\)

The test statistic is:
\(\phantom{\bullet} t_0 \approx -0.55 ~ (2 ~ \text{dp})\)

The appropriate t-multiplier is:
\(\phantom{\bullet} t^*_{0.975}(6557.4) = 1.96\)
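The slide compares \(t_0\) to the t-multiplier; \(|t_0| = 0.55\) is well inside \(\pm 1.96\), so we expect a large p-value. A sketch of that calculation from the slide's reported summary values:

```r
# CS 1.1: two-sided p-value from the reported summary values
t0 <- -9.3062 / 17.0519514
p.value <- 2 * pt(abs(t0), df = 6557.4, lower.tail = FALSE)
p.value  # approx 0.59, so no evidence against H0 at the 5% level
```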

Notes on interpreting a hypothesis test for μ1 - μ2

  • A positive number, \(\color{tomato}+\), for \(\text{Diff}_0\) means that you are testing whether the population/underlying mean of the first group is greater than that of the second group
  • A negative number, \(\color{steelblue}-\), for \(\text{Diff}_0\) means that you are testing whether the population/underlying mean of the first group is lower than that of the second group
  • A zero, \(0\), for \(\text{Diff}_0\) means that you are testing whether the population/underlying mean of the first group is the same as that of the second group

Case studies

CS 5.1: Thiol concentration

A group of researchers in 1982 noted that thiol concentrations within human blood cells are seldom determined in clinical studies, in spite of the fact that they are believed to play a key role in many vital processes. They reported a new reliable method for measuring thiol concentration (in mmol) and demonstrated that, in one disease at least (rheumatoid arthritis), the change in thiol status in the lysate from packed blood cells is substantial.

There were two groups of volunteers, the first group sampled from a population with “normal” thiol concentrations and the second group sampled from those who have rheumatoid arthritis.

Variables

  • concent: A number denoting the thiol concentration (in mmol)
  • type: A factor denoting the population the observation belonged to, Normal or Rheumatoid

thiol.df <- read.csv("datasets/thiol.csv")
stripplot(type ~ concent, data = thiol.df, cex = 1.25,
  jitter.data = TRUE, xlab = "Thiol concentration (mmol)", 
  main = "Distribution of thiol concentration by population")

Figure: The thiol concentrations of 13 volunteers by population

Are all four assumptions met?

CS 5.1: Thiol concentration

Construct and interpret a 99% confidence interval for the difference in the average thiol concentrations between the “normal” and rheumatoid arthritis populations.

You may use the fact that the t-multiplier’s degrees of freedom, \(\nu\), is approximately \(5.2528\)

With 99% confidence, we estimate that the true mean thiol concentration for the rheumatoid population exceeds that of the normal population by somewhere between 0.83 and 2.26 mmol

# Calculate & save the necessary descriptive statistics
xbars <- split(thiol.df, ~ type) |> 
  lapply(\(x) mean(x$concent))
sds <- split(thiol.df, ~ type) |> 
  lapply(\(x) sd(x$concent))
ns <- split(thiol.df, ~ type) |> 
  lapply(\(x) nrow(x))

# Assign the t-multiplier
t.mult <- qt(0.995, df = 5.2528)
t.mult
[1] 3.934093
# Implementing the equation from Slide 11
estimate <- xbars$Normal - xbars$Rheumatoid
se <- (sds$Normal^2 / ns$Normal + sds$Rheumatoid^2 / ns$Rheumatoid) |>
  sqrt()

estimate + c(-1, 1) * t.mult * se
[1] -2.2599077 -0.8272352

CS 5.1: Thiol concentration

Conduct a hypothesis test at the 1% significance level to detect if the average thiol concentration of the rheumatoid arthritis population exceeds that of the “normal” population.

You may use the fact that the Student’s t-distribution’s \(\nu\) parameter is approximately \(5.2528\)

\(H_0\!: \mu_R - \mu_N = 0\)
\(H_1\!: \mu_R - \mu_N > 0\)

We have very strong evidence against the null hypothesis that the average thiol concentrations of the rheumatoid arthritis and “normal” populations are equal, in favour of the alternative that the average thiol concentration of the rheumatoid arthritis population is greater than that of the “normal” population (p-value = 0.0001)

# Refer to Slide 23 for initial calculations
# Implementing the equation from Slide 17
estimate <- xbars$Rheumatoid - xbars$Normal
se <- (sds$Rheumatoid^2 / ns$Rheumatoid + sds$Normal^2 / ns$Normal) |>
  sqrt()

t0 <- (estimate - 0) / se
t0
[1] 8.477239
# Implementing the p-value calculation from Slide 18
pt(t0, df = 5.2528, lower.tail = FALSE)
[1] 0.0001468314

CS 5.2: “The Pen is Mightier Than the Keyboard!”

A study randomly assigned students to take notes either longhand or using a laptop. The researchers had the students take a test after they wrote their notes. Does the data provide evidence of a difference in taking notes longhand rather than on a laptop?

Variables

  • score: An integer denoting the test score (unitless)
  • method: A factor denoting the note taking method, longhand or laptop

notes.df <- read.csv("datasets/notes.csv")
histogram( ~ score | method, data = notes.df, 
  xlab = "Test score (unitless)", type = "count",
  main = "Distribution of test scores by note taking method")

Figure: The test scores of 78 students by note taking method

Are all four assumptions met?

CS 5.2: “The Pen is Mightier Than the Keyboard!”

Use the t.test() function to construct a 95% confidence interval for the difference in the mean test score between the longhand and laptop note taking methods.

# Let's use the t.test() function for a two-sample t-test
t.test(score ~ method, notes.df, conf.level = 0.95)

    Welch Two Sample t-test

data:  score by method
t = -2.9773, df = 74.31, p-value = 0.003924
alternative hypothesis: true difference in means between group laptop and group longhand is not equal to 0
95 percent confidence interval:
 -10.592821  -2.099284
sample estimates:
  mean in group laptop mean in group longhand 
              19.07500               25.42105 

CS 5.2: “The Pen is Mightier Than the Keyboard!”

R, by default, organises the levels of categorical variables in alphabetical order. To manually change the order, we need to make use of the factor() function

# Create a new variable to rotate the group order
notes.df$method.new <- factor(notes.df$method, levels = c("longhand", "laptop"))

# Let's use the t.test() function for a two-sample t-test
t.test(score ~ method.new, notes.df, conf.level = 0.95)

    Welch Two Sample t-test

data:  score by method.new
t = 2.9773, df = 74.31, p-value = 0.003924
alternative hypothesis: true difference in means between group longhand and group laptop is not equal to 0
95 percent confidence interval:
  2.099284 10.592821
sample estimates:
mean in group longhand   mean in group laptop 
              25.42105               19.07500 

Equation Reference: ν for a two-sample t-test

The Student’s t-distribution is an approximation for the sampling distribution of all possible test statistics, \(t_0\), for the two-sample t-test taught in DATAX121 (Wild & Seber, 2000)

Interestingly, the Student’s t-distribution is a very good approximation if the degrees of freedom parameter, \(\nu\), is set to the following:

\[ \nu = \frac{\left\{ \frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2} \right\}^2}{\frac{1}{n_1 - 1} \left\{\frac{(s_1)^2}{n_1}\right\}^2 + \frac{1}{n_2 - 1} \left\{\frac{(s_2)^2}{n_2}\right\}^2} \]

This equation for \(\nu\) is commonly known as Satterthwaite’s approximation
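A sketch of this equation as an R function (the helper name is our own); applying it to the CS 1.1 summary statistics reproduces the \(\nu \approx 6557.4\) used earlier in these slides:

```r
# Welch-Satterthwaite degrees of freedom from summary statistics
welch.df <- function(s1, s2, n1, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# CS 1.1: Auckland vs Wellington
welch.df(s1 = 885.168, s2 = 840.5936, n1 = 9059, n2 = 3459)  # approx 6557.4
```

Note that \(\nu\) need not be an integer—R's qt() and pt() accept fractional degrees of freedom, as seen with df = 5.2528 in CS 5.1.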